finite set
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling
Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-problem in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling. Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-arm bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.42)
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling
Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-problem in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling. Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
From Ethical Declarations to Provable Independence: An Ontology-Driven Optimal-Transport Framework for Certifiably Fair AI Systems
Bhattacharya, Sukriti, Majumdar, Chitro
This paper presents a framework for provably fair AI that overcomes the limits of current bias mitigation methods by systematically removing all sensitive information and its proxies. Using ontology engineering in OWL 2 QL, it formally defines sensitive attributes and infers their proxies through logical reasoning, constructing a sigma algebra G that captures the full structure of biased patterns. Fair representations are then obtained via Delbaen Majumdar optimal transport, which generates variables independent of G while minimizing L2 distance to preserve accuracy. This guarantees true independence rather than mere decorrelation. By modeling bias as dependence between sigma algebras, compiling ontological knowledge into measurable structures, and using optimal transport as the unique fair transformation, the approach ensures complete fairness in tasks like loan approval, where proxies such as ZIP code reveal race. The result is a certifiable and mathematically grounded method for trustworthy AI.
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
- Law (1.00)
- Health & Medicine (0.68)
- Banking & Finance > Insurance (0.46)
- North America > United States (0.14)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-arm bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.78)